9 research outputs found

    Overview of MV-HEVC prediction structures for light field video

    Light field video is a promising technology for delivering the required six degrees of freedom for natural content in virtual reality. Existing multi-view coding (MVC) and multi-view plus depth (MVD) formats, such as MV-HEVC and 3D-HEVC, are the most conventional light field video coding solutions, since they can compress video sequences captured simultaneously from multiple camera angles. 3D-HEVC treats a single view as a video sequence and the other sub-aperture views as gray-scale disparity (depth) maps. MV-HEVC, on the other hand, treats each view as a separate video sequence, which allows the use of motion-compensated algorithms similar to HEVC. While MV-HEVC and 3D-HEVC provide similar results, MV-HEVC does not require any disparity maps to be readily available, and it has a more straightforward implementation since it only uses syntax elements rather than additional prediction tools for inter-view prediction. However, there are many degrees of freedom in choosing an appropriate prediction structure, and it is still unknown which one is optimal for a given set of application requirements. In this work, various prediction structures for MV-HEVC are implemented and tested. The findings reveal the trade-off between compression gains, distortion, and random access capabilities in MV-HEVC light field video coding. The results give an overview of the best-performing solutions developed in the context of this work, as well as prediction structure algorithms proposed in the state-of-the-art literature. This overview provides a useful benchmark for the future development of light field video coding solutions.
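    As a rough illustration of the degrees of freedom involved, a prediction structure can be modeled as a mapping from each (view, frame) picture to the reference pictures it predicts from. The IPP-style scheme below (inter-view prediction only at anchor frames, temporal chaining elsewhere) is a hypothetical example for illustration, not one of the specific structures evaluated in the paper.

    ```python
    # Hypothetical sketch: an MV-HEVC-style prediction structure expressed
    # as a dependency graph. The IPP layout is an assumed example.

    def build_ipp_structure(num_views, gop_size):
        """Map each (view, frame) picture to the pictures it predicts from."""
        deps = {}
        for v in range(num_views):
            for t in range(gop_size):
                refs = []
                if t > 0:
                    refs.append((v, t - 1))   # temporal prediction from previous frame
                if v > 0 and t == 0:
                    refs.append((v - 1, 0))   # inter-view prediction at anchor frames
                deps[(v, t)] = refs
        return deps

    structure = build_ipp_structure(num_views=3, gop_size=4)
    # (0, 0) is the only intra-coded picture; every other picture has references.
    print(structure[(2, 0)])  # inter-view reference: [(1, 0)]
    print(structure[(1, 3)])  # temporal reference:   [(1, 2)]
    ```

    Varying which pictures use inter-view references, the temporal hierarchy depth, and the GOP size spans the design space the paper explores.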

    Real-time low-complexity digital video stabilization in the compressed domain


    Random access prediction structures for light field video coding with MV-HEVC

    Computational imaging and light field technology promise to deliver the required six degrees of freedom for natural scenes in virtual reality. Existing extensions of standardized video coding formats, such as multi-view coding and multi-view plus depth, are currently the most conventional light field video coding solutions. The latest multi-view coding format, a direct extension of the high efficiency video coding (HEVC) standard, is called multi-view HEVC (MV-HEVC). MV-HEVC treats each light field view as a separate video sequence and uses syntax elements similar to standard HEVC for exploiting redundancies between neighboring views. To achieve this, inter-view and temporal prediction schemes are deployed with the aim of finding the optimal trade-off between coding performance and reconstruction quality. The number of possible prediction structures is unlimited, and many have been proposed in the literature. Although some of them are efficient in terms of compression ratio, they complicate random access due to dependencies on previously decoded pixels or frames. Random access is an important feature in video delivery and a crucial requirement in multi-view video coding. In this work, we propose and compare different prediction structures for coding light field video using MV-HEVC, with a focus on both compression efficiency and random accessibility. Experiments on three different short-baseline light field video sequences show the trade-off between bit-rate and distortion, as well as the average number of decoded views/frames necessary for displaying any random frame at any time instance. The findings of this work indicate the most appropriate prediction structure depending on the available bandwidth and the required degree of random access.
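    The random-accessibility metric described above (the number of views/frames that must be decoded before an arbitrary frame can be displayed) amounts to a traversal of the prediction-dependency graph. The sketch below is a hypothetical illustration of that idea, not the authors' implementation; the two-view IPP structure at the end is an assumed example.

    ```python
    # Hypothetical sketch: random-access cost of a prediction structure,
    # counted as all pictures in the dependency closure of the target.

    def decode_cost(deps, target):
        """Count all pictures (including the target) needed to decode `target`."""
        seen, stack = set(), [target]
        while stack:
            pic = stack.pop()
            if pic not in seen:
                seen.add(pic)
                stack.extend(deps[pic])   # follow reference-picture dependencies
        return len(seen)

    # Assumed 2-view IPP structure: view 1 anchors on view 0, frames chain temporally.
    deps = {(v, t): ([(v, t - 1)] if t > 0 else ([(0, 0)] if v > 0 else []))
            for v in range(2) for t in range(4)}
    print(decode_cost(deps, (1, 3)))  # -> 5: (0,0),(1,0),(1,1),(1,2),(1,3)
    ```

    Averaging this cost over all (view, frame) targets gives the kind of random-access figure that can be traded against compression efficiency.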

    Hard real-time, pixel-parallel rendering of light field videos using steered mixture-of-experts

    Steered Mixture-of-Experts (SMoE) is a novel framework for the approximation, coding, and description of image modalities such as light field images and video. The long-term goal is to arrive at a representation for six-degrees-of-freedom (6DoF) image data. Previous research has shown the feasibility of real-time pixel-parallel rendering of static light field images: each pixel is independently reconstructed by the kernels that lie in its vicinity, and the number of kernels involved forms the bottleneck on the achievable frame rate. The goal of this paper is twofold. First, we introduce pixel-level rendering of light field video, as previous work only rendered static content. Second, we investigate rendering using a predefined number of most significant kernels. As such, we can meet hard real-time constraints by trading off reconstruction quality.
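    The second contribution (rendering each pixel from a fixed budget of its most significant kernels) can be sketched as follows. This is a simplified illustration under stated assumptions: constant expert values per kernel instead of SMoE's steered regressors, and illustrative parameter shapes.

    ```python
    import numpy as np

    # Hypothetical sketch: reconstruct one pixel from its K most significant
    # Gaussian kernels. Constant experts are an assumed simplification of
    # SMoE's steered (linear) regressors.

    def render_pixel(x, centers, inv_covs, priors, expert_vals, k):
        d = x - centers                                    # (N, 2) offsets to kernels
        mahal = np.einsum('ni,nij,nj->n', d, inv_covs, d)  # squared Mahalanobis distance
        resp = priors * np.exp(-0.5 * mahal)               # unnormalized responsibilities
        top = np.argpartition(resp, -k)[-k:]               # hard budget: keep K largest
        w = resp[top] / resp[top].sum()                    # renormalized gating weights
        return float(w @ expert_vals[top])                 # gated mixture of experts
    ```

    Capping K bounds the per-pixel work regardless of how many kernels the model contains, which is what makes a hard real-time guarantee possible at the cost of some reconstruction quality.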

    Light field image and video coding for immersive media


    Light field image compression using Versatile Video Coding

    Light field cameras (or plenoptic cameras) make it possible to capture not only the light intensity in a scene, but also the direction of light traveling through space. The resulting light field images add two extra dimensions to conventional 2D images by capturing a scene from different angles. Light field technology is expected to deliver the needed six degrees of freedom in virtual reality, and it is a cornerstone technology in holography. Due to the vast amount of data a light field needs for representing high-resolution images, new coding solutions are required. While coding standards aimed at plenoptic images can be expected in the near future, video coding standards are among the most efficient solutions for compressing light fields. There are countless scanning topologies for restructuring a light field image into a 2D video, and similarly, numerous prediction structures to be used for inter-prediction. In this paper, we provide a comparison of light field image coding performance using the three latest generations of video compression standards. Moreover, we assess a set of coding structures aiming for optimal compression rates as well as random access capabilities. Results show the necessary trade-off between bandwidth and potential random access between light field views at the decoding level. Bit-rate savings of up to 25% were achieved in certain cases where the proposed prediction structure is used over a common scheme from the literature. The findings of this overview provide a useful benchmark for the future development of light field image coding solutions using video coding standards.

    Highly parallel steered mixture-of-experts rendering at pixel-level for image and light field data

    A novel image approximation framework called steered mixture-of-experts (SMoE) was recently presented. SMoE has multiple applications in coding, scale conversion, and general processing of image modalities. In particular, it has strong potential for coding and streaming the higher-dimensional image modalities that are necessary to leverage full translational and rotational freedom (six degrees of freedom) in virtual reality for camera-captured images. In this paper, we analyze the rendering performance of SMoE for 2D images and 4D light fields. Two different GPU implementations that parallelize the SMoE regression step at the pixel level are presented, including experimental evaluations of rendering performance and quality. We show that, on appropriate hardware, an OpenCL implementation can achieve 85 fps and 22 fps for 1080p and 4K renderings, respectively, of large models with more than 100,000 Gaussian kernels.
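    The pixel-parallel regression step lends itself to the vectorized sketch below: every pixel evaluates all Gaussian kernels independently, which is the structure the GPU implementations exploit. This is a hypothetical NumPy illustration, not the paper's OpenCL code, and it assumes constant experts in place of SMoE's steered regressors.

    ```python
    import numpy as np

    # Hypothetical sketch: SMoE regression evaluated for all pixels at once.
    # Each row of the responsibility matrix is one pixel's gating over kernels.

    def smoe_regress(grid, centers, inv_covs, priors, expert_vals):
        """grid: (P, 2) pixel coordinates; returns (P,) reconstructed values."""
        d = grid[:, None, :] - centers[None, :, :]            # (P, N, 2) offsets
        mahal = np.einsum('pni,nij,pnj->pn', d, inv_covs, d)  # (P, N) distances
        resp = priors * np.exp(-0.5 * mahal)                  # kernel responsibilities
        resp /= resp.sum(axis=1, keepdims=True)               # normalize per pixel
        return resp @ expert_vals                             # mixture value per pixel
    ```

    Because each pixel's row is computed independently, the loop over pixels maps directly onto one GPU work-item per pixel.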